314 research outputs found

One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition

Author: Jensen Lars Juhl
Publication date: 01/01/2016

Copenhagen University Research Information System

CoCoScore: Context-aware co-occurrence scoring for text mining applications using distant supervision

Author: Jensen Lars Juhl
Junge Alexander
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2019

Copenhagen University Research Information System

ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins

Author: Frenkel-Morgenstern Milana
Gorohovski Alessandro
Jensen Lars Juhl
Tagore Somnath
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2019

Copenhagen University Research Information System

Dictionary construction and identification of possible adverse drug events in Danish clinical narrative text

Author: Brunak Søren
Eriksson Robert
Jensen Lars Juhl
Jensen Peter Bjødstrup
Pletscher-Frankild Sune
Publication venue: BMJ
Publication date: 01/01/2013

OBJECTIVE: Drugs have tremendous potential to cure and relieve disease, but the risk of unintended effects is always present. Healthcare providers increasingly record data in electronic patient records (EPRs), in which we aim to identify possible adverse events (AEs) and, specifically, possible adverse drug events (ADEs).
MATERIALS AND METHODS: Based on the undesirable effects section from the summary of product characteristics (SPC) of 7446 drugs, we have built a Danish ADE dictionary. Starting from this dictionary we have developed a pipeline for identifying possible ADEs in unstructured clinical narrative text. We use a named entity recognition (NER) tagger to identify dictionary matches in the text and post-coordination rules to construct ADE compound terms. Finally, we apply post-processing rules and filters to handle, for example, negations and sentences about subjects other than the patient. Moreover, this method allows synonyms to be identified and anatomical location descriptions can be merged to allow appropriate grouping of effects in the same location.
RESULTS: The method identified 1 970 731 (35 477 unique) possible ADEs in a large corpus of 6011 psychiatric hospital patient records. Validation was performed through manual inspection of possible ADEs, resulting in precision of 89% and recall of 75%.
DISCUSSION: The presented dictionary-building method could be used to construct other ADE dictionaries. The complication of compound words in Germanic languages was addressed. Additionally, collapsing synonyms and anatomical locations improves the method.
CONCLUSIONS: The developed dictionary and method can be used to identify possible ADEs in Danish clinical narratives.
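
The pipeline described in this abstract (dictionary matching, synonym collapsing, negation filtering) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy English dictionary, the negation cues, and the function name `tag_possible_ades` are hypothetical, and the crude "rest of the sentence" negation scope is a stand-in for the paper's post-processing rules, which the abstract does not specify.

```python
import re

# Hypothetical toy dictionary: surface form -> normalized ADE term.
# The paper's dictionary is Danish and built from SPC texts; this is a stand-in.
ADE_DICTIONARY = {
    "nausea": "nausea",
    "headache": "headache",
    "dizziness": "dizziness",
    "vertigo": "dizziness",  # synonym collapsed onto one normalized term
}

# Hypothetical negation cues; the real rule set is not given in the abstract.
NEGATION_CUES = ("no", "not", "denies", "without")


def tag_possible_ades(text):
    """Return normalized ADE terms that are not preceded by a negation cue
    within the same sentence."""
    found = []
    # Split into sentences at whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"\w+", sentence.lower())
        negated = False
        for token in tokens:
            if token in NEGATION_CUES:
                negated = True  # crude scope: the rest of the sentence
            elif token in ADE_DICTIONARY and not negated:
                found.append(ADE_DICTIONARY[token])
    return found
```

For example, `tag_possible_ades("Patient reports nausea and vertigo. No headache.")` keeps the first two mentions (with "vertigo" mapped to "dizziness") and drops the negated one.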

Crossref

Copenhagen University Research Information System

PubMed Central

Online Research Database In Technology

Circular reasoning rather than cyclic expression

Author: Bork Peer
Brunak Søren
de Lichtenberg Ulrik
Jensen Lars Juhl
Jensen Thomas Skøt
Publication venue: BioMed Central
Publication date: 01/01/2008

A response to "Combined analysis reveals a core set of cycling genes" by Y Lu, S Mahony, PV Benos, R Rosenfeld, I Simon, LL Breeden and Z Bar-Joseph. Genome Biol 2007, 8:R146

Crossref

PubMed Central

Copenhagen University Research Information System

MDC Repository

Online Research Database In Technology

TISSUES 2.0: an integrative web resource on mammalian tissue expression

Author: Gorodkin Jan
Jensen Lars Juhl
Palasca Oana
Santos Alberto
Stolte Christian
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Copenhagen University Research Information System

Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000

Author: Eisenhaber Birgit
Eisenhaber Frank
Jensen Lars Juhl
Kalbuaji Bharata
Sinha Swati
Publication venue: Wiley
Publication date: 01/01/2018

The mentioning of gene names in the body of the scientific literature 1901–2017 and their fractional counting is used as a proxy to assess the level of biological function discovery. A literature score of one has been defined as full publication equivalent (FPE), the amount of literature necessary to achieve one publication solely dedicated to a gene. It has been found that less than 5000 human genes have each at least 100 FPEs in the available literature corpus. This group of elite genes (4817 protein‐coding genes, 119 non‐coding RNAs) attracts the overwhelming majority of the scientific literature about genes. Yet, thousands of proteins have never been mentioned at all, ≈2000 further proteins have not even one FPE of literature and, for ≈4600 additional proteins, the FPE count is below 10. The protein function discovery rate measured as numbers of proteins first mentioned or crossing a threshold of accumulated FPEs in a given year has grown until 2000 but is in decline thereafter. This drop is partially offset by function discoveries for non‐coding RNAs. The full human genome sequencing does not boost the function discovery rate. Since 2000, the fastest growing group in the literature is that with at least 500 FPEs per gene. ASTAR (Agency for Sci., Tech. and Research, S’pore). Published versio
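
The fractional counting mentioned in this abstract can be sketched in Python. The abstract does not state the exact weighting, so this sketch assumes the simplest scheme: each publication carries one unit, split equally among the distinct genes it mentions, so that a paper dedicated to a single gene contributes exactly one FPE to it. The function name `fractional_counts` and the input layout are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(papers):
    """Accumulate per-gene literature scores.

    `papers` is an iterable of gene-name lists, one list per publication.
    Assumption (not confirmed by the abstract): each publication's single
    unit of credit is divided equally among the distinct genes it mentions.
    """
    scores = defaultdict(Fraction)  # Fraction() starts at 0
    for genes in papers:
        unique = set(genes)  # repeated mentions within one paper count once
        if not unique:
            continue  # a paper mentioning no gene contributes nothing
        share = Fraction(1, len(unique))
        for gene in unique:
            scores[gene] += share
    return dict(scores)
```

Under this assumption the scores always sum to the number of publications that mention at least one gene, which is a convenient sanity check on any corpus.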

Copenhagen University Research Information System

Protein-driven inference of miRNA-disease associations

Author: Gorodkin Jan
Jensen Lars Juhl
Mørk Søren
Palleja Albert
Pletscher-Frankild Sune
Publication venue: Oxford University Press (OUP)
Publication date: 21/11/2013

Motivation: MicroRNAs (miRNAs) are a highly abundant class of non-coding RNA genes involved in cellular regulation and thus also diseases. Despite miRNAs being important disease factors, miRNA–disease associations remain low in number and of variable reliability. Furthermore, existing databases and prediction methods do not explicitly facilitate forming hypotheses about the possible molecular causes of the association, thereby making the path to experimental follow-up longer.
Results: Here we present miRPD in which miRNA–Protein–Disease associations are explicitly inferred. Besides linking miRNAs to diseases, it directly suggests the underlying proteins involved, which can be used to form hypotheses that can be experimentally tested. The inference of miRNAs and diseases is made by coupling known and predicted miRNA–protein associations with protein–disease associations text mined from the literature. We present scoring schemes that allow us to rank miRNA–disease associations inferred from both curated and predicted miRNA targets by reliability and thereby to create high- and medium-confidence sets of associations. Analyzing these, we find statistically significant enrichment for proteins involved in pathways related to cancer and type I diabetes mellitus, suggesting either a literature bias or a genuine biological trend. We show by example how the associations can be used to extract proteins for disease hypothesis.
Availability and implementation: All datasets, software and a searchable Web site are available at http://mirpd.jensenlab.org.
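
The coupling step described in this abstract (joining miRNA–protein associations with protein–disease associations through the shared protein) can be sketched in Python. This is not the miRPD scoring scheme, which the abstract does not detail: the sketch assumes each association has a confidence in [0, 1], scores a path by the product of its two confidences, and keeps the best-scoring linking protein per miRNA–disease pair. The function name and all identifiers in the usage note are hypothetical placeholders.

```python
def infer_mirna_disease(mirna_protein, protein_disease):
    """Infer ranked miRNA-disease associations via shared proteins.

    mirna_protein:   dict mapping (mirna, protein) -> confidence
    protein_disease: dict mapping (protein, disease) -> confidence
    Assumption: a path's score is the product of its two confidences, and
    only the best-scoring linking protein is kept for each pair.
    """
    best = {}
    for (mirna, protein), mp_score in mirna_protein.items():
        for (linked_protein, disease), pd_score in protein_disease.items():
            if linked_protein != protein:
                continue  # the two associations must share the protein
            score = mp_score * pd_score
            key = (mirna, disease)
            if key not in best or score > best[key][0]:
                best[key] = (score, protein)
    rows = [(m, d, s, p) for (m, d), (s, p) in best.items()]
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(rows, key=lambda r: (-r[2], r[0], r[1]))
```

Each returned tuple is (miRNA, disease, score, linking protein), so the protein that suggests a testable hypothesis travels with the inferred association.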

CiteSeerX

Copenhagen University Research Information System

PubMed Central

STITCH: interaction networks of chemicals and proteins

Author: Bork Peer
Campillos Monica
Jensen Lars Juhl
Kuhn Michael
von Mering Christian
Publication date: 02/08/2017

The knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug-target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database. STITCH is available at http://stitch.embl.d

RERO DOC Digital Library

STITCH 3: zooming in on protein-chemical interactions

Author: Bork Peer
Franceschini Andrea
Jensen Lars Juhl
Kuhn Michael
Szklarczyk Damian
von Mering Christian
Publication date: 02/08/2017

To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increase 4-fold. The database can be accessed interactively through a web interface, displaying interactions in an integrated network view. It is also available for computational studies through downloadable files and an API. As an extension in the current version, we offer the option to switch between two levels of detail, namely whether stereoisomers of a given compound are shown as a merged entity or as separate entities. Separate display of stereoisomers is necessary, for example, for carbohydrates and chiral drugs. Combining the isomers increases the coverage, as interaction databases and publications found through text mining will often refer to compounds without specifying the stereoisomer. The database is accessible at http://stitch.embl.d

RERO DOC Digital Library